Using Genres to Improve Search Engines
Abstract
Modern search engines are typically queried with keywords, which primarily convey the topic of the sought web page. Consequently, the top hits are often topically relevant but still not what the user wants. The premise of this paper is that the relevance of the hits can be improved by also searching by genre, a classification criterion orthogonal to topic. To this end, a genre classifier was built using machine learning methods. It was used in web page retrieval to filter out the hits not belonging to the desired genre. This approach considerably improved the relevance of the top ten hits, which indicates that a genre classifier can be a useful addition to search engines.
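The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hit` type, the `GENRE_CUES` table, and the cue-word `toy_genre_classifier` are hypothetical stand-ins for the machine-learned classifier, used only so that the example runs on its own. Only the idea of dropping hits whose predicted genre differs from the requested one comes from the text.

```python
from typing import Callable, Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One search result: its URL and the page text (hypothetical type)."""
    url: str
    text: str


# Hypothetical cue words per genre. This table is an assumption made for the
# example; the paper's classifier is learned from data, not hand-written.
GENRE_CUES = {
    "shop": {"price", "buy", "cart", "shipping"},
    "faq": {"question", "answer", "faq", "how"},
}


def toy_genre_classifier(text: str) -> str:
    """Return the genre with the most cue-word matches, or 'other' if none match."""
    words = text.lower().split()
    scores = {genre: sum(word in cues for word in words)
              for genre, cues in GENRE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def filter_by_genre(hits: Iterable[Hit], wanted: str,
                    classify: Callable[[str], str], k: int = 10) -> List[Hit]:
    """Keep hits whose predicted genre equals `wanted`, preserving the
    search engine's ranking, and return at most the first k of them."""
    kept = [hit for hit in hits if classify(hit.text) == wanted]
    return kept[:k]


# Usage: the ranked hits are all on topic, but only two are shop pages.
hits = [
    Hit("a.example/store", "Buy now low price free shipping"),
    Hit("b.example/help", "FAQ question and answer about cameras"),
    Hit("c.example/shop", "Add to cart best price"),
]
shop_hits = filter_by_genre(hits, "shop", toy_genre_classifier)
```

Passing the classifier as a function argument keeps the filter independent of how genres are predicted, so a trained model could replace the toy classifier without changing the filtering code.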
Similar resources
Genre Classification of Web Documents
Retrieving relevant documents over the Web is an overwhelming task when search engines return thousands of Web documents. Sifting through these documents is time-consuming and sometimes leads to an unsuccessful search. One problem is that most search engines rely on matching a query to documents based solely on topical keywords. However, many users of search engines have a particular genre in m...
Automatic evaluation of Persian web video search engines based on vote aggregation
Today, the growth of the internet and its strong influence on individuals’ lives have led many users to meet their daily needs through search engines; hence, search engines need to be modified and continuously improved. Therefore, evaluating search engines to determine their performance is of paramount importance. In Iran, as well as other countries, extensive research is being performed ...
Automatic Classification of Musical Artists based on Web-Data
The organization of music is one of the central challenges in times of increasing distribution of digital music. A well-tried means is the classification in genres and/or styles. In this paper we propose the use of text categorization techniques to classify artists present on the Internet. In particular, we retrieve and analyze webpages ranked by search engines to describe artists in terms of w...
Thesis Stereotyping the Web: Genre Classification of Web Documents
Retrieving relevant documents over the Web is a difficult task. Currently, search engines rely on keywords for matching documents to user queries. This paper explores the potential for discriminating documents based on the genre of the document. I define genre as a taxonomy that incorporates the style, form and content of a d...
Semantic Web Tools for Categorization Greek Texts on the Internet: the MeDa13 standard and TeGO ontology
The wider question of this study is the suitability of existing Web search engines for the needs of school education. It examines the relevance to the teaching objectives of the results returned by the search process given a query and its (stated or unstated) purpose in the context of an educational activity. The particular field of teaching and research interest is Modern Greek in Cypriot seco...